Baining Guo

Real-Time Generation of Streamable Talking Portrait Video with Reference-Guided Deep Compression VAEs

Jun 01, 2026
Via arXiv

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

May 20, 2026
Via arXiv

Covering Human Action Space for Computer Use: Data Synthesis and Benchmark

May 12, 2026
Via arXiv

Towards On-Policy SFT: Distribution Discriminant Theory and its Applications in LLM Training

Feb 12, 2026
Via arXiv

MobileManiBench: Simplifying Model Verification for Mobile Manipulation

Feb 05, 2026
Via arXiv

RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents

Feb 02, 2026
Via arXiv

Controlled LLM Training on Spectral Sphere

Jan 13, 2026
Via arXiv

LoLA: Long Horizon Latent Action Learning for General Robot Manipulation

Dec 23, 2025
Via arXiv

VASA-3D: Lifelike Audio-Driven Gaussian Head Avatars from a Single Image

Dec 16, 2025
Via arXiv

Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos

Oct 24, 2025
Via arXiv